10 research outputs found

Incremental dimension reduction of tensors with random index

Author: B Emruli
Blerim Emruli
D Achlioptas
DM Kane
E Velldal
Fredrik Sandin
I Fronza
J Karlgren
J Matoušek
K Lund
M Baroni
M Berry
M Sahlgren
M Wan
Magnus Sahlgren
MWM Boyd
N Goel
N Halko
P Frankl
P Kanerva
PD Turney
RG Baraniuk
S Dasgupta
S Deerwester
Science Staff
SS Vempala
T Cohen
TG Kolda
TK Landauer
V Vasuki
W Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/03/2011

We present an incremental, scalable and efficient dimension reduction technique for tensors that is based on sparse random linear coding. Data is stored in a compactified representation with fixed size, which makes memory requirements low and predictable. Component encoding and decoding are performed on-line without computationally expensive re-analysis of the data set. The range of tensor indices can be extended dynamically without modifying the component representation. This idea originates from a mathematical model of semantic memory and a method known as random indexing in natural language processing. We generalize the random-indexing algorithm to tensors and present signal-to-noise-ratio simulations for representations of vectors and matrices. We also present a mathematical analysis of the approximate orthogonality of high-dimensional ternary vectors, a property that underpins this and other similar random-coding approaches to dimension reduction. To further demonstrate the properties of random indexing, we present results of a synonym identification task. The method presented here has some similarities with random projection and Tucker decomposition, but it performs well at high dimensionality only (n > 10^3). Random indexing is useful for a range of complex practical problems, e.g., in natural language processing, data mining, pattern recognition, event detection, graph searching and search engines. Prototype software is provided. It supports encoding and decoding of tensors of order >= 1 in a unified framework, i.e., vectors, matrices and higher-order tensors. Comment: 36 pages, 9 figures.
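The encoding and decoding steps described in this abstract can be illustrated for the simplest case, a vector (tensor of order 1). The sketch below is not the authors' prototype software. The function names, the dimension of 2000, the 8 nonzero entries per index vector, the balanced +1/-1 split and the normalized inner-product decoder are all assumptions chosen for illustration.

```python
import random

def make_index_vector(dim, nonzeros, rng):
    """Return a sparse ternary vector; half of the nonzeros are +1, half -1 (assumed)."""
    vec = [0] * dim
    for k, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def encode(state, index_vector, value):
    """Add `value` for one component into the fixed-size state, in place."""
    for j, r in enumerate(index_vector):
        if r:
            state[j] += value * r

def decode(state, index_vector, nonzeros):
    """Estimate a component as the normalized inner product with its index vector."""
    return dot(state, index_vector) / nonzeros

# Hypothetical parameters: a fixed-size state of 2000 entries, 8 nonzeros per index vector.
DIM, NONZEROS = 2000, 8
rng = random.Random(0)
state = [0.0] * DIM
index_vectors = {}

# Encode two components on-line, one at a time.
for name, value in [("a", 3.0), ("b", -1.5)]:
    index_vectors[name] = make_index_vector(DIM, NONZEROS, rng)
    encode(state, index_vectors[name], value)

# Extend the index range later: only a new index vector is needed,
# the stored state keeps its size and existing entries are not re-analysed.
index_vectors["c"] = make_index_vector(DIM, NONZEROS, rng)
encode(state, index_vectors["c"], 2.0)

for name in index_vectors:
    print(name, decode(state, index_vectors[name], NONZEROS))
```

Decoded values are exact only when the index vectors involved do not overlap. Overlaps between sparse ternary vectors become rarer as the dimension grows, which is the approximate-orthogonality property the abstract refers to.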

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Luleå University of Technology Publications

Detecting modification of biomedical events using a deep parsing approach

Author: A Copestake
A Frank
A MacKinlay
Andrew MacKinlay
B Medlock
C Pollard
D Flickinger
David Martinez
E Briscoe
E Buyko
E Velldal
G Móra
H Kilicoglu
H Uszkoreit
I Solt
J Björne
J Hakenberg
JD Kim
KB Cohen
P Adolphs
R Farkas
S Van Landeghem
Timothy Baldwin
U Callmeier
V Vincze
WW Chapman
Y Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. "analysis of IkappaBalpha phosphorylation", where it is not specified whether phosphorylation did or did not occur) or negated (e.g. "inhibition of IkappaBalpha phosphorylation", where phosphorylation did not occur). The data comes from a standard dataset created for the BioNLP 2009 Shared Task. The system uses a machine-learning approach, where the features used for classification are a combination of shallow features derived from the words of the sentences and more complex features based on the semantic outputs produced by a deep parser. Method: To detect event modification, we use a Maximum Entropy learner with features extracted from the data relative to the trigger words of the events. The shallow features are bag-of-words features based on a small sliding context window of 3-4 tokens on either side of the trigger word. The deep parser features are derived from parses produced by the English Resource Grammar and the RASP parser. The outputs of these parsers are converted into the Minimal Recursion Semantics formalism, and from this, we extract features motivated by linguistics and the data itself. All of these features are combined to create training or test data for the machine learning algorithm. Results: Over the test data, our methods produce approximately a 4% absolute increase in F-score for detection of event modification compared to a baseline based only on the shallow bag-of-words features. Conclusions: Our results indicate that grammar-based techniques can enhance the accuracy of methods for detecting event modification.
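The shallow features mentioned in this abstract, bag-of-words features from a small window around the trigger word, can be sketched as follows. This is an illustrative assumption about how such features might be extracted, not the authors' code. The function name `window_features`, the `bow=` feature prefix and the lowercasing step are hypothetical, and the Maximum Entropy learner and deep parser features are left out.

```python
def window_features(tokens, trigger_index, window=3):
    """Collect binary bag-of-words features within `window` tokens of the trigger."""
    features = {}
    start = max(0, trigger_index - window)
    end = min(len(tokens), trigger_index + window + 1)
    for i in range(start, end):
        if i == trigger_index:
            continue  # the trigger word itself is not a context feature here
        features["bow=" + tokens[i].lower()] = 1
    return features

# Example phrase taken from the abstract; "phosphorylation" is treated as the trigger.
tokens = ["inhibition", "of", "IkappaBalpha", "phosphorylation"]
print(window_features(tokens, trigger_index=3, window=3))
```

Feature dictionaries of this shape could then be passed to any classifier that accepts sparse binary features.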

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository